Errors in aerial survey count data: Identifying pitfalls and solutions

Abstract Accurate estimates of animal abundance are essential for guiding effective management, and poor survey data can produce misleading inferences. Aerial surveys are an efficient survey platform, capable of collecting wildlife data across large spatial extents in short timeframes. However, these surveys can yield unreliable data if not carefully executed. Despite a long history of aerial survey use in ecological research, problems common to aerial surveys have not yet been adequately resolved. Through an extensive review of the aerial survey literature over the last 50 years, we evaluated how common problems encountered in the data (including nondetection, counting error, and species misidentification) can manifest, the potential difficulties conferred, and the history of how these challenges have been addressed. Additionally, we used a double‐observer case study focused on waterbird data collected via aerial surveys and an online group (flock) counting quiz to explore the potential extent of each challenge and possible resolutions. We found that nearly three quarters of the aerial survey methodology literature focused on accounting for nondetection errors, while issues of counting error and misidentification were less commonly addressed. Through our case study, we demonstrated how these challenges can prove problematic by detailing the extent and magnitude of potential errors. Using our online quiz, we showed that aerial observers typically undercount group size and that the magnitude of counting errors increases with group size. Our results illustrate how each issue can act to bias inferences, highlighting the importance of considering individual methods for mitigating potential problems separately during survey design and analysis. 
We synthesized the information gained from our analyses to evaluate strategies for overcoming the challenges of using aerial survey data to estimate wildlife abundance, such as digital data collection methods, pooling species records by family, and ordinal modeling using binned data. Recognizing conditions that can lead to data collection errors and having reasonable solutions for addressing errors can allow researchers to allocate resources effectively to mitigate the most significant challenges for obtaining reliable aerial survey data.

| INTRODUCTION
Reliable estimates of wildlife abundance are imperative for understanding how environmental variables influence population and community dynamics, assessing trends across time and space, and guiding conservation and management decisions (Williams et al., 2002). Most estimates of wildlife abundance are derived from surveys that collect count data on target species (Elphick, 2008). These surveys are typically designed to yield counts of the species within predefined sampling units for a fixed amount of sampling effort (e.g., observation time, travel speed) to make inferences on abundance across a study region. Differing sampling designs, methods, and analysis techniques for count-based surveys can vary in their ability to yield accurate and precise estimates of abundance. Poorly conducted surveys can produce data that obscure animal-environment relationships or introduce biases into inferences (Conroy et al., 2008).
For species that occur at low densities or across large spatial areas, aerial surveys are often the most efficient platform to collect observational count data (Caughley, 1977; Parker et al., 2010).
Aerial surveys typically consist of flight transects in which observers count individuals of the target species along a transect line or strip or within the boundaries of a sampling plot from a fixed-wing aircraft or helicopter (Caughley, 1977; Jolly, 1969). Aerial surveys have a long history in ecological research, starting with censuses of North American ungulate populations in rugged and remote terrain in the 1940s (Buechner et al., 1951; Hunter & Yeager, 1949). Researchers have long recognized the distinct advantages of aerial surveys, including the ability to rapidly collect data across large spatial extents (Keeping et al., 2018; Lee & Bond, 2016). Comparable ground-based surveys can take weeks to cover the same area that an aerial survey can cover in a matter of days (Keeping et al., 2018). As such, systematic aerial survey designs can be more cost-effective than ground-based surveys despite aerial surveys being considerably more expensive per unit time (Keeping et al., 2018; Khaemba et al., 2001). In remote environments or rugged terrain, wildlife monitoring is often only feasible with aerial surveys. Ground-based terrestrial surveys are typically limited to areas with road systems or areas that can be safely traversed by foot (Jachmann, 1991), while vessel-based surveys of marine and aquatic environments are slower-paced and best suited for remote regions far from land (Briggs et al., 1985). In many situations, aerial surveys are the preferred, and sometimes only, method for data collection on wide-ranging species, including those that occupy remote environments (Conn et al., 2013), are highly mobile, or are difficult to count from the ground because of body size, coloration, or cryptic behaviors (Greene et al., 2017).
In addition, data from aerial surveys may be summarized into a meaningful index of abundance for tracking changes in species' populations and distributions over time (Amundson et al., 2019; Chirima et al., 2012; Finch et al., 2021; Obbard et al., 2018). However, such indices are subject to biases, particularly if surveys are not standardized and/or errors in data collection are not constant through time.
While aerial surveys offer many benefits, the method also presents challenges for high-quality inferences on species abundance.
Mistakes resulting from imperfect observer detection during sampling can introduce errors into the data. As with other survey types, common manifestations of imperfect detection in aerial surveys include: nondetection (failure to detect an individual or group even though it is present), counting error (inaccurate enumeration of group size), and species misidentification (incorrectly identifying the species of an individual). Nondetection errors occur because an individual that is available to be seen is missed or because an individual is unavailable for detection (e.g., temporarily outside of the survey unit, under vegetation or water and not exposed to sampling; Kéry & Schmidt, 2008). For example, ungulates, such as mule deer (Odocoileus hemionus), can be difficult to detect with aerial surveys in certain cover types and vegetation density (Zabransky et al., 2016), which can result in a failure to record all individuals on a survey transect. Counting errors can result in observers either over- or under-recording the true number of individuals on a transect.
Counting errors may also occur as a product of species behavior or the survey platform itself (e.g., fixed-wing versus helicopter surveys). Many species, including mid-sized marine mammals, aggregate in large numbers and are highly mobile, making it difficult to accurately enumerate group sizes from fast-moving aircraft (Gerrodette et al., 2019). Counting errors are often treated as a failure to detect individuals (i.e., as a nondetection), and common methods for estimating nondetection (e.g., through detection probabilities) can address minor counting error issues. However, such methods cannot accommodate severe counting errors, such as those that might occur when large groups are encountered. Species misidentification can be a bi-directional issue if a survey focuses on multiple species, resulting in an over-count of one species and an undercount of another. Although observers may be able to detect small-bodied animals, such as many waterbird species, they may be difficult, or nearly impossible, to correctly identify (Johnston et al., 2015) due to the speed of the aircraft and distance from the observer.
In this paper, we provide an overview of the current challenges to estimating species' abundance using aerial survey data. We reviewed the literature on aerial wildlife survey methods over the last 50 years to examine how each major issue manifests across species and environments. Several challenges of using aerial survey data have not been adequately discussed in the literature despite their persistence in aerial survey data, likely because, in many cases, there is no obvious approach to adequately address these issues. Often, issues such as counting error and misidentification are ignored during analyses of count data because researchers do not recognize that they are present, cannot estimate the magnitude of errors, or are unable to account for the errors directly during analysis (Clement et al., 2017). Thus, in addition to our literature review, we highlight how aerial survey challenges can manifest using a case study of waterbird aerial surveys in the Gulf of Mexico and an online quiz of aerial observers.

KEYWORDS
abundance, aerial survey, count data, counting error, imperfect detection, nondetection, species misidentification, study design

TAXONOMY CLASSIFICATION
Applied ecology; Conservation ecology

The case study data come from an aerial survey that implemented a double-observer method and are therefore ideal to investigate both how errors can arise in aerial count data and also how they might be addressed through data analysis. Additionally, issues of misidentification and counting error are prevalent in waterbird data due to frequent aggregations of multispecies groups. The online quiz further highlights this issue of counting error, particularly how the magnitude of observer counting error changes as group size increases. The purpose of our review is to provide clarity on the possible errors that can be introduced in aerial survey data but are often ignored, guide researchers to reasonable approaches to ameliorate ongoing issues, and identify areas for future research.

| Literature review
We searched Web of Science and Google Scholar using the key words: "aerial survey*," "aerial wildlife survey*," "aerial survey issue*," "aerial survey error*," and "aerial survey method*." We limited our search to peer-reviewed articles published between 1970 and 2020.
This cutoff ensured a large sample size through time while also excluding the earliest papers describing methods and technologies that are no longer relevant. Our inclusion criteria required that the article: (1) contain aerial survey methodology for in-flight observer surveys, and (2) discuss the implications of the methodology on the accuracy or precision of count data on subsequent inferences. We did not include papers that only reported results from aerial survey work. In most cases, included papers were methods-focused, generally on specific aspects of aerial survey design and implementation.
We inspected the titles and abstracts of all articles of the first fifty results returned by each key word (n = 5) from the two search engines and discarded articles that did not fit the inclusion criteria. We read all papers that passed this initial inspection and further refined our collection based on the inclusion criteria. We also searched the literature cited of all articles that passed our inspection to ensure we did not overlook any critical literature. For each of the articles that met the inclusion criteria, we identified the major issues encountered and categorized the type of issue. We also identified the system and taxa in which the study was conducted, aircraft type, and the sampling style or design (line transect, strip transect, systematic sampling, double observer, mark-recapture, distance sampling; Appendix S1). Because our review is a synthesis of the relevant literature instead of a systematic review, all quantitative metrics from our literature review reported herein should be considered indicative of general trends within aerial survey literature.
Issues of imperfect detection are conflated in the literature because most models used to estimate abundance are unable to simultaneously parse multiple sources of observation error, such as unobserved individuals versus misidentified and miscounted individuals (but see Clement et al., 2017 for an interesting exception). For the purposes of this review, we distinguish nondetection, counting error, and species misidentification as distinct issues. Developing effective mitigation strategies for aerial survey methods requires understanding the different potential sources of error. Left uncorrected, these various detection errors may act differentially to bias count data.

| Waterbirds case study
(Certain & Bretagnolle, 2008). For our analyses, we focus on aerial waterbird surveys conducted in winter and summer 2018, winter 2019, and winter 2020.
To examine detection errors, data were collected with a double-observer protocol where same-side front- and rear-seat observers independently recorded count and species identification records of all waterbirds that they observed in the observation strip (flight transect out to 200 m). Two experienced observers, a pilot-biologist and a crew member, were always stationed in the front seats of the plane and counted out of their respective windows. A second experienced observer (another crew member) sat in a rear seat either behind the pilot-biologist or behind the first crew member for the double-observer protocol. The two crew members rotated their seat positions throughout the survey so crew member detection could be evaluated independently of seat position. Observers (pilot-biologist and crew members) recorded the species (or taxonomic family when species identification was not possible), number of individuals in the group (one or more), and the GPS location. During post hoc data processing, we grouped double-observer records that were recorded within 10 s of each other. We chose this temporal cutoff to accommodate differences in visibility between observers and potential lags in recording time. For example, front observers could see farther ahead of the aircraft than the rear observers, and this visibility difference may have produced recording lags for the rear observer.
Thus, the 10-s window limited double-observer records to those most likely to contain matching records. Grouped double-observer records were then classified as: Species + Count Match, where count and species identification matched between observer records; Generic + Count Match, where count and taxonomic family matched between observer records; Species + Bin Match, where log10 count bin (i.e., 0, 1-10, 11-100, 101-1000, and 1000+) and species identification matched between observer records (after count matches were accounted for); Generic + Bin Match, where log10 count bin and taxonomic family matched between observer records (after count matches were accounted for); Species Only Match, where species identification matched but neither count nor count bin matched between observer records; Generic Only Match, where taxonomic family matched but neither count nor count bin matched between observer records; Mismatch, where species did not match between observer records; and No Match, where there was no observation from the other observer recorded within 10 s. We note that the term "generic" is meant in the sense of "general," not in the taxonomic sense of "genera." This double-observer protocol and data processing procedure allowed us to identify potential errors, including nondetection, counting error, and misidentification.
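The record-matching and classification rules described above can be sketched in a few lines of Python. This is a minimal illustration rather than the processing code used for the surveys: the record layout (`t`, `taxon`, `count`), the greedy nearest-in-time pairing rule, the `FAMILY_OF` lookup, and the example records are all assumptions made for the sketch. It also assumes that a pair whose species differ but whose family agrees falls in a "Generic" category, and that only pairs whose families differ are a "Mismatch".

```python
def count_bin(n):
    """Map a count to the log10 bins named in the text."""
    if n <= 0:
        return "0"
    if n <= 10:
        return "1-10"
    if n <= 100:
        return "11-100"
    if n <= 1000:
        return "101-1000"
    return "1000+"


def pair_records(front, rear, window=10.0):
    """Greedily pair each front record with the closest unused rear record
    recorded within `window` seconds (this pairing rule is an assumption)."""
    unused = list(rear)
    pairs = []
    for f in sorted(front, key=lambda r: r["t"]):
        near = [r for r in unused if abs(r["t"] - f["t"]) <= window]
        if near:
            best = min(near, key=lambda r: abs(r["t"] - f["t"]))
            unused.remove(best)
            pairs.append((f, best))
        else:
            pairs.append((f, None))
    # Rear records with no front record in the window are also unmatched.
    pairs.extend((r, None) for r in unused)
    return pairs


def classify_pair(a, b, family_of):
    """Assign one grouped pair of records to a match category."""
    if b is None:
        return "No Match"
    # Taxa missing from the lookup (e.g., family-level records) map to themselves.
    fam_a = family_of.get(a["taxon"], a["taxon"])
    fam_b = family_of.get(b["taxon"], b["taxon"])
    if fam_a != fam_b:
        return "Mismatch"
    level = "Species" if a["taxon"] == b["taxon"] else "Generic"
    if a["count"] == b["count"]:
        return f"{level} + Count Match"
    if count_bin(a["count"]) == count_bin(b["count"]):
        return f"{level} + Bin Match"
    return f"{level} Only Match"


# Hypothetical records: t is seconds since the start of the transect.
FAMILY_OF = {"Brown Pelican": "Pelecanidae",
             "Laughing Gull": "Laridae",
             "Royal Tern": "Laridae"}
front = [{"t": 0, "taxon": "Brown Pelican", "count": 12},
         {"t": 30, "taxon": "Laughing Gull", "count": 5},
         {"t": 100, "taxon": "Royal Tern", "count": 1}]
rear = [{"t": 4, "taxon": "Brown Pelican", "count": 15},
        {"t": 33, "taxon": "Royal Tern", "count": 5}]

labels = [classify_pair(a, b, FAMILY_OF) for a, b in pair_records(front, rear)]
print(labels)
```

Checking count matches before bin matches mirrors the "after count matches were accounted for" ordering in the category definitions.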

| Online quiz
We conducted an online survey to evaluate observer counting errors with known group-size data. We designed the group (flock) counting quiz using Qualtrics survey software. The design and content of the quiz were adapted from the U.S. Fish and Wildlife Service Aerial Observer Training and Testing Resources (https://www.fws.gov/waterfowlsurveys/). We distributed the quiz via email to ~100 trained aerial observers (including those from the GoMMAPPS project) and biologists with no aerial survey observer experience.
Seventy-eight individuals completed the online quiz. The quiz consisted of background questions regarding respondents' level of experience conducting aerial bird surveys (Expert, Intermediate, Novice, No Experience; Appendix S2, Table A1) and confidence in their flock counting skills (High, Medium, Low; Appendix S2, Table A2). The flock counting portion of the quiz consisted of two practice images and 22 timed quiz images of known-size flocks (Appendix S2, Table A3 and Figure A1) that were representative of flock sizes observed during GoMMAPPS surveys (Appendix S2, Figure A2). Each image was displayed for 10 s before it disappeared, and the quiz automatically advanced to a question asking how many birds were in the image. See Appendix S2 for additional details.

| Literature review
Of the 108 items returned by the Web of Science search and the 250 items returned by the Google Scholar search, 102 papers that were published from 1974 to 2020 met our inclusion criteria (Appendix S1). Although the number of peer-reviewed publications using aerial survey methods has increased over time, papers focused specifically on methodology of aerial surveys have not exhibited the same trend. The issues identified in the literature were present in our waterbird case study, highlighting the high likelihood that these issues are prevalent in most aerial surveys even if they are not reported. We report results from both the literature review and our case study on each of the issues of nondetection, counting error, and species misidentification in the following sections.

| Analyses of aerial survey challenges
TABLE 1 Summary of data matches between two observers recording data on the same side of an aerial survey for each season of the Gulf of Mexico Marine Assessment Program for Protected Species (GoMMAPPS) surveys. We grouped double-observer records that were recorded within 10 s of each other and classified these records into categories based on the following criteria: Species + Count Match, where count and species identification matched between observer records; Generic + Count Match, where count and taxonomic family matched between observer records; Species + Bin Match, where log10 count bin (i.e., 0, 1-10, 11-100, 101-1000, and 1000+) and species identification matched between observer records (after count matches were accounted for); Generic + Bin Match, where log10 count bin and taxonomic family matched between observer records (after count matches were accounted for); Species Only Match, where species identification matched but neither count nor count bin matched between observer records; Generic Only Match, where taxonomic family matched but neither count nor count bin matched between observer records; Mismatch, where species did not match between observer records; and No Match, where there was no observation from the other observer recorded within 10 s. For the purposes of this study, the identifications of "gull" and "tern" were included in the species-level identifications described above, and these identifications were pooled under the family Laridae for higher-level generic identifications.

3.2.1 | Nondetection: failure to record individuals when they are present
In the GoMMAPPS survey data, approximately 36% of observations recorded by one observer were missed by the other observer on the same side of the aircraft (Table 1: No Match across all surveys). Of these missed observations, most were single individuals (77.7%, N = 1503), and frequency of missed observations decreased with increasing group size.
Of the 36% missed observations, 8% were the result of one observer recording more species present than the other observer (N = 158). We note that our "No Match" results include instances where a bird record was available to be counted for one observer and not the other. In certain instances, the movement of the plane resulted in birds flushing from the flight transect, which could have resulted in them having been recorded by one observer (likely front observer) but missed by the other (likely rear observer). Thus, we recognize that our results represent a "worst-case scenario" for missed observations.
We also compared naïve detection probabilities across crew members for each survey event, where we calculated the proportion of records that were matched between pairs of double observers (all but No Match category). Naïve detection probabilities of groups of waterbirds were highly variable among individual observers and across survey events ( ing "correction factors" from "sightability models" (Caughley et al., 1976). These sightability models considered survey variables, such as flight height, flight speed, and observation strip width to calculate a correction factor that was then applied to the aerial count data to correct for visibility biases. Although this method is less common in more recent literature (2000-present), correction factors continue to be used. The primary reason that the use of correction factors has declined is because such models are cumbersome to implement over varying conditions, heterogeneous landscapes, and across multiple species as a different model/correction factor is required for each scenario (Steinhorst & Samuel, 1989). For example, based on the nondetection errors we uncovered in the GoMMAPPS data, we would need to model a correction factor for each region, survey event, and observer, and we would likely need to account for different visibility conditions, as well. Thus, the correction factor approach can become untenable when many factors contribute to nondetection errors.
In the 1990s, other statistical techniques were introduced to formally address issues of nondetection in aerial count data (Quang & Becker, 1997), including distance sampling and reconciled double-observer methods. In distance sampling (sometimes referred to as "line transect sampling" in the aerial survey literature [Quang & Becker, 1997]), observers move along a transect line and record the distance to detected animals. The recorded distances are used to fit a detection function that describes the change in detection probability as a function of distance from the transect and is used to estimate the proportion of animals not detected (Buckland et al., 2001). Reconciled double-observer methods exploit mark-recapture methods (often called the "double-count technique" in early aerial survey literature [Graham & Bell, 1989]), where two observers independently record the number of detected animals and agree on which animals were detected by both observers. The first observer "marks" and "releases" certain animals while the second observer "recaptures" the animals. This creates a two-occasion capture history that can be used to estimate the number of missed animals (Graham & Bell, 1989). These methods are an improvement over the use of correction factors because they allow researchers to model detection as a dynamic variable across heterogeneous environments and visibility conditions, as well as estimate uncertainty around detection probability (Walter & Hone, 2003).
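To make the two-occasion capture history concrete, the sketch below applies the standard Lincoln-Petersen estimator, and Chapman's small-sample variant, to reconciled double-observer tallies. The function name and the example tallies are hypothetical, and the estimator assumes independent observers with constant detection probability, which real surveys may violate. It is not the specific model fitted in any of the cited studies.

```python
def double_observer_estimate(n1, n2, m):
    """Two-occasion (Lincoln-Petersen) estimate from reconciled tallies.

    n1: groups recorded by observer 1 (the "marked" sample)
    n2: groups recorded by observer 2 (the "recapture" sample)
    m:  groups that both observers agree they recorded
    """
    if m <= 0 or m > min(n1, n2):
        raise ValueError("m must be positive and no larger than n1 or n2")
    p1 = m / n2  # share of observer 2's groups that observer 1 also recorded
    p2 = m / n1  # share of observer 1's groups that observer 2 also recorded
    return {
        "p1": p1,
        "p2": p2,
        # Probability that at least one observer records a group.
        "p_either": 1 - (1 - p1) * (1 - p2),
        # Lincoln-Petersen estimate of the number of groups present.
        "n_hat": n1 * n2 / m,
        # Chapman's variant, less biased when m is small.
        "n_hat_chapman": (n1 + 1) * (n2 + 1) / (m + 1) - 1,
    }


# Hypothetical tallies: 80 and 70 groups recorded, 56 recorded by both.
estimate = double_observer_estimate(80, 70, 56)
print(estimate)
```

With these made-up tallies, the two observers together recorded 94 distinct groups, and the estimator suggests that roughly 100 groups were present, so about 6 were missed by both.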
In the last decade, researchers have combined methods for estimating detection from double-observer mark-recapture and distance sampling into a single model (Burt et al., 2014). This approach uses the strengths of both distance sampling and mark-recapture sampling to fit a detection function where the shape of the function is estimated with distance sampling methods and the intercept of the function is estimated using the mark-recapture data (Laake et al., 2008).
(Chabot & Francis, 2016; Žydelis et al., 2019). Although widely used, in-flight observer counts are often biased (Caughley, 1977; Jolly, 1969), with variability among observers and a tendency to underestimate group size (Chabot & Francis, 2016), particularly for large groups (Buckland et al., 2012; Frederick et al., 2003).
In the GoMMAPPS survey data, flock counts varied between same-side observers, with the magnitude of differences between front- and rear-observer counts increasing with flock size (Figure 3a). At large flock sizes (200-1000), observer counts ranged from 40% to 150% of the true flock size, which is consistent with previous studies (Frederick et al., 2003). On average for the large flocks (>100), observer counts were 35-48% of the true flock size. However, even at small flock sizes (<100), mean observer errors were as high as 30%.

FIGURE 3 (a) Counts of waterbirds from front and rear
Observers most frequently under-counted flock sizes, a tendency that increased with flock size: 50-70% of all observers underestimated flock size some or all of the time when the true size was above 30 individuals. Observer experience and confidence in counting skills had no effect on observer ability to correctly enumerate flock size in the online quiz (Appendix S2, Figures A6-A7).
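When true flock sizes are known, as in the quiz, counting error can be summarized as a signed relative error per response. The sketch below shows one way to do this and to compare small and large flocks. The responses, the function names, and the threshold of 100 birds are all invented for illustration and are not the quiz data or the analysis behind the figures cited above.

```python
def relative_error(counted, true_size):
    """Signed counting error as a fraction of the true flock size."""
    return (counted - true_size) / true_size


def summarize(responses):
    """Summarize (true_size, counted) pairs.

    Returns the share of responses that undercount and the mean absolute
    relative error.
    """
    errors = [relative_error(counted, true) for true, counted in responses]
    n = len(errors)
    return {
        "share_under": sum(e < 0 for e in errors) / n,
        "mean_abs_error": sum(abs(e) for e in errors) / n,
    }


# Hypothetical quiz responses as (true flock size, observer count).
responses = [(50, 40), (50, 55), (200, 120), (200, 150)]

small = summarize([r for r in responses if r[0] <= 100])
large = summarize([r for r in responses if r[0] > 100])
overall = summarize(responses)
print(small, large, overall)
```

Using a signed error keeps under- and over-counts distinguishable, while the mean absolute error tracks how the magnitude of error changes with flock size.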
As most aerial survey research treats counting error as a nondetection issue, the solutions for handling nondetection errors are generally applicable to counting errors. However, recent advances in hierarchical modeling have made it possible to partition nondetection errors from counting errors. Clement et al. (2017) combined a mark-recapture distance sampling model with an N-mixture model (Royle, 2004) to separately account for nondetection and counting errors. Under this approach, observers independently recorded counts of observed groups in addition to the detection history and distance data collected for a mark-recapture distance sampling model. Combining the three sampling methods into a single hierarchical model allows for unbiased abundance estimates when observers imperfectly detect individuals due to nondetection errors, as well as counting error (Clement et al., 2017). A limitation to this model is that it requires a double-observer protocol, which may be costly or impractical for some survey efforts, and it also requires distance sampling, which is not feasible in all survey situations (such as the GoMMAPPS surveys).
Another potential solution for handling counting errors is to use ordinal modeling (Guisan & Harrell, 2000). In this approach, count data are binned into categories (e.g., 0, 1-10 individuals, 11-50 individuals), and the probability of obtaining a certain category is then modeled instead of the counts directly (Guisan & Harrell, 2000). The appropriate bin breaks may be based on a log scale or another scale based on magnitude of observer error (Figure 3) or where natural breaks occur in the data (Valle et al., 2019). Modeling binned count data rather than the counts themselves may alleviate potential concerns regarding inferences based on counts with errors and allow for the estimation of uncertainty around the probability of ordinal classifications (Fitzgerald et al., 2021). Additionally, if ordinal modeling approaches are comparable to or better than the typical approach of using a count distribution (such as the negative binomial) to model abundance (Zipkin et al., 2014), collecting count data on a categorical scale may limit the training and time needed for data collection. Count data may also be binned after field data collection if concerns arise regarding accuracy of recorded counts.
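The binning step that precedes an ordinal model can be illustrated as follows. This sketch recodes raw counts into ordered categories using the log10 bins from the case study (other breaks could be substituted) and tabulates category proportions. The counts are hypothetical, and fitting the ordinal model itself is outside the scope of the example.

```python
from bisect import bisect_left
from collections import Counter

# Upper bound of each bin except the last, following the log10 bins
# used in the case study (0, 1-10, 11-100, 101-1000, 1000+).
BIN_UPPER = [0, 10, 100, 1000]
BIN_LABELS = ["0", "1-10", "11-100", "101-1000", "1000+"]


def to_ordinal(count):
    """Return the ordered category code (0-4) for a non-negative count."""
    if count < 0:
        raise ValueError("counts must be non-negative")
    return bisect_left(BIN_UPPER, count)


def category_proportions(counts):
    """Proportion of records in each ordinal category, in bin order."""
    tally = Counter(to_ordinal(c) for c in counts)
    n = len(counts)
    return {BIN_LABELS[k]: tally.get(k, 0) / n for k in range(len(BIN_LABELS))}


# Hypothetical counts; 250 and 400 for the same flock share one category.
counts = [0, 3, 10, 11, 250, 400, 4000]
codes = [to_ordinal(c) for c in counts]
print(codes)
print(category_proportions(counts))
```

Two observers who record 250 and 400 birds for the same flock disagree on the count but agree on the category, which is the property that makes binned responses less sensitive to counting error.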

| Misidentification: failure to correctly identify individuals
Wildlife survey data are often analyzed without consideration of species identification errors, despite evidence that identification errors occur regularly (Conn et al., 2013). Indeed, only 10 out of the 102 (~10%) papers directly dealt with species misidentification.
Papers that reported issues associated with species misidentification tended to focus on small- or medium-bodied animals that are difficult to clearly identify or even detect from the air (Greene et al., 2017; Lamprey et al., 2020). One paper addressed difficulties with age and sex classification in elk (Cervus canadensis); like species identification, age and sex classification also requires observers to discern small details from the survey aircraft (Bender et al., 2003).
Although the GoMMAPPS survey observers were trained in waterbird species identification, our double-observer data indicated that only ~41% of the observations recorded by both observers contained matching species identifications ( were not identified to species-level (except for gulls and terns), comprised ~9% of the total records (Table 1: generic + count match, generic + bin match, and generic only match categories). Mismatched records, including individuals identified as different species by the two observers (where taxonomic family also did not match between double-observer records), comprised ~13% of the total records (Table 1). Some of these records may be detection errors rather than misidentification errors, as we could not separate these issues. Nevertheless, given the findings of our study, it appears that species identification errors are likely to be present and possibly pervasive in multispecies aerial survey datasets, especially when similar species (e.g., similar in body size and/or coloration) co-occur, as they often do in waterbirds.
Auxiliary data that contain species-level (or sex/age class) records are generally needed to correct identification errors. For most studies in our literature review, researchers were only able to report that identification errors existed because they had access to other, independent survey data in addition to the aerial survey data (e.g., ground-based [Laursen et al., 2008; Greene et al., 2017; Lamprey et al., 2020] and vessel-based [Johnston et al., 2015] surveys). However, auxiliary data are rarely available because they are costly and time-consuming to obtain, requiring a second, simultaneous surveying effort. Furthermore, secondary surveys may suffer from the same errors present in aerial survey data. Because of misidentification errors, it may be inappropriate to model individual species, sex, or age class counts when identifying features are difficult to distinguish and secondary sources of data are unavailable.
No secondary sources of data were available to complement the GoMMAPPS aerial survey data that could be used to correct for potential misidentification errors. However, when we aggregated species records by higher-order taxonomic classifications (i.e., family), we found that the records complementarily identified by both observers (front and rear) increased by approximately 10% (Table 1: generic + count match, generic + bin match, and generic only match categories). This suggests that analyzing data at higher taxonomic levels, rather than species, may be a reasonable approach to overcome identification issues when similar species co-occur if species-level identification is not a primary goal.
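The effect of pooling by family on observer agreement can be checked with a short calculation like the one below. The paired identifications, the `FAMILY_OF` lookup, and the function name are hypothetical and chosen only to illustrate the comparison. The resulting rates are not the GoMMAPPS values.

```python
# Hypothetical species-to-family lookup for the taxa in the example.
FAMILY_OF = {
    "Royal Tern": "Laridae",
    "Sandwich Tern": "Laridae",
    "Laughing Gull": "Laridae",
    "Herring Gull": "Laridae",
    "Brown Pelican": "Pelecanidae",
    "Double-crested Cormorant": "Phalacrocoracidae",
}


def match_rate(pairs, key=lambda taxon: taxon):
    """Share of paired records whose identifications agree after applying
    `key` (identity for species level, a family lookup for pooled level)."""
    agree = sum(key(front) == key(rear) for front, rear in pairs)
    return agree / len(pairs)


# Hypothetical (front observer, rear observer) identifications.
pairs = [
    ("Royal Tern", "Royal Tern"),
    ("Royal Tern", "Sandwich Tern"),
    ("Laughing Gull", "Herring Gull"),
    ("Brown Pelican", "Double-crested Cormorant"),
]

species_rate = match_rate(pairs)
family_rate = match_rate(pairs, key=FAMILY_OF.get)
print(species_rate, family_rate)
```

Pairs that disagree at the species level but fall within one family count as agreements once pooled, whereas cross-family disagreements remain mismatches at either level.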

| SYNTHESIZING SOLUTIONS: A PATH FORWARD
Our literature review and empirical case study reveal that issues of nondetection, counting error, and species misidentification are prevalent in aerial survey count data. The extent to which each of these issues may bias inferences depends on the unique circumstances of individual survey efforts, including the frequency and severity of the errors as well as the goals of the survey and the subsequent analyses of the data. The ideal approach for mitigating potential biases from aerial survey data will vary based on the specific questions asked and which issue(s) are likely to occur with the particular survey con-

| Survey planning
Selecting an appropriate sampling framework for particular research question(s) or management objective(s) is paramount for choosing an effective study design that will result in reliable inferences. However, aerial surveys must balance a number of logistical and practical considerations with the scientific goal(s) of the study.
Logistical considerations, such as defining the spatial extent of sampling and determining the appropriate configuration of aerial sampling units, influence survey cost and efficiency and determine which sampling methodologies are feasible (Caughley, 1977; Gibbs et al., 1998). Standardizing sampling methodologies across survey events, including the survey design as well as data collection protocols, is important to ensure that indices of abundance or distribution of species are comparable across years/seasons.
These design considerations also influence the types of research questions that are possible to address. Understanding the effects of environmental variables on species abundance requires a great deal of survey data, and if this is the goal, researchers should prioritize sampling a range of environmental conditions many times (Zipkin et al., 2015). However, if the goal of the survey is to estimate abundance of a species (or a group of species), researchers may consider using probabilistic sampling strategies rather than stratifying sampling units across the full range of environmental variables of interest. In many cases, aerial surveys are used to estimate indices of abundance or distribution that are used to track changes over time, and if this is the goal of the survey, researcher efforts should focus on maintaining consistency in design, personnel, and protocols over time to minimize observer errors related to changes in the survey.
When designing an aerial survey, researchers are encouraged to carefully consider their research goals and the extent to which survey design can be used to either mitigate or elucidate nondetection, counting, and misidentification errors.

| Sampling methodology
Our literature review revealed that distance sampling is the most popular framework for collecting aerial count data and modeling abundance with a detection probability. However, we suggest this approach only be used when observers have adequate time to record distances (or distance bands) and when the survey targets a small number of species. One solution could be to use high-resolution photography or video in addition to, or instead of, in-flight observers, which would allow for a number of different analytical methods for estimating abundance and detection probability. However, in the presence of other errors (e.g., counting and species identification errors), estimating absolute abundance may be misguided, as relative abundance indices may be the only obtainable parameter.
The double-observer method can be used to reconcile disparate observer data during data collection if observers work together to agree on what was observed (Quang & Becker, 1997). Pooling count data to create binned categories decreases the resolution of available data but may more accurately reflect the true uncertainty regarding the precision of the survey count data (Guisan & Harrell, 2000; Valle et al., 2019). Our GoMMAPPS analyses suggest that binning counts is beneficial for group sizes as small as 6-30 individuals and certainly for group sizes reaching hundreds or thousands of individuals. Although ordinal modeling is used relatively infrequently in ecology, this framework offers a promising alternative to modeling exact counts and can reflect uncertainty in count data when counting errors may be present. If species-level inferences are required, researchers could explore data integration with publicly available datasets (e.g., eBird, iNaturalist). Data integration, or modeling that incorporates multiple, dissimilar data types (e.g., count data and presence/absence data), can yield more detailed information about a process of interest, including more accurate and precise estimates, than an analysis using a single data source (Zipkin et al., 2019).
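As a minimal sketch of how counts might be pooled into ordinal bins before modeling, the example below maps group counts to size classes and checks whether two observers' counts fall in the same class. Only the 6-30 class is mentioned in the text; the remaining bin boundaries, the labels, and the function names `count_to_bin` and `bins_match` are assumptions made for illustration.

```python
import bisect

# Lower bounds of each ordinal size class. Only the 6-30 class comes from the text;
# the other boundaries are assumed for illustration.
BIN_LOWER_BOUNDS = [1, 6, 31, 101, 1001]
BIN_LABELS = ["1-5", "6-30", "31-100", "101-1000", ">1000"]


def count_to_bin(count):
    """Return the 0-based ordinal bin index for a group count of at least 1."""
    if count < 1:
        raise ValueError("count must be at least 1")
    # bisect_right counts how many lower bounds are <= count.
    return bisect.bisect_right(BIN_LOWER_BOUNDS, count) - 1


def bins_match(front_count, rear_count):
    """True when both observers' counts fall in the same size class."""
    return count_to_bin(front_count) == count_to_bin(rear_count)


# Hypothetical paired counts of the same group: (front observer, rear observer).
paired_counts = [(18, 25), (28, 35), (400, 650), (3, 3)]

for front, rear in paired_counts:
    label_front = BIN_LABELS[count_to_bin(front)]
    label_rear = BIN_LABELS[count_to_bin(rear)]
    print(front, rear, label_front, label_rear, bins_match(front, rear))
```

Counts of 18 and 25 disagree exactly but share the 6-30 class, so the binned records match. Counts of 28 and 35 straddle a boundary and do not match, which illustrates that the choice of bin edges affects apparent agreement and should be made with the expected magnitude of counting error in mind.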

| Digital data collection and future directions
The last decade of aerial survey research has seen a rise in digital data collection methods, including photography and video collected by drones and unstaffed aerial vehicles (Corcoran et al., 2021; Nowak et al., 2019). These technologies have the advantage of being less expensive than traditional staffed flights as well as being safer for research personnel, as they do not require in-flight observations. However, a drawback of unstaffed aerial vehicles is that they cannot cover as large a spatial area as quickly as a traditional staffed flight. Nevertheless, photo and video observations typically produce higher quality abundance and density estimates than traditional in-flight observer methods (Buckland et al., 2012; Chabot & Francis, 2016). After data are collected, photographs and videos may be reviewed by numerous observers, which can allow researchers to use a number of different methods for estimating detection probability, as well as for identifying counting and identification errors. However, despite these advantages, this technology is not immune to the previously discussed issues. Manual image or video classification is subject to the same human errors of nondetection, counting error, and especially species misidentification that in-flight observers experience (Chabot & Francis, 2016). Although photos and videos may be proofed multiple times, this is time-consuming and potentially costly. High-resolution photography and videography are undoubtedly helpful in resolving counting errors, but imagery must be of high enough quality that distinguishing features can be discerned to differentiate among similar species.
Digital object classification (i.e., machine learning) offers a promising way forward for handling the time-intensive data processing required of digital data collection. Methods for automating object classification have improved in recent years and are already useful for reducing nondetection and counting errors (Torney et al., 2019), but automated species identification is more challenging (Chabot & Francis, 2016; Villon et al., 2020). Future work on digital object classification presents an opportunity to engage the public to help classify images that can be used as training data for classification algorithms (Torney et al., 2019), broadening the impact of research beyond the study system itself (Adler et al., 2020). Although digital methods may help to combat some of the human errors observed in the literature (including our own work), these technologies may also suffer some of the same shortcomings as count data collected by in-flight observers. Thus, the suggestions presented in this paper should be useful for combatting errors in count data collected both by human observers and digital methods.

| CONCLUSIONS
Imperfect detection can manifest as nondetection, counting error, and species misidentification, and all these sources of error should be considered when collecting and analyzing aerial survey data.
Although counting error and species misidentification have received comparatively limited attention (and thus fewer solutions) relative to nondetection, errors generated by all three sources are present and likely prevalent in aerial survey count data. Ignoring these errors or neglecting to address them explicitly could lead to biased or misleading inferences. Researchers should be aware that these issues exist and take measures to combat them during the design, data collection, and analysis stages of a study. Recognizing the conditions that can lead to data collection errors can allow researchers to allocate resources toward minimizing potential errors or invest more resources toward goals with fewer perceived challenges.

ACKNOWLEDGMENTS
The authors thank the observers and pilot-biologists for their hard work collecting the GoMMAPPS survey data. They also thank Alex

CONFLICT OF INTEREST
The authors declare that they have no known competing financial or personal relationships that could have influenced the work reported in this paper.

OPEN RESEARCH BADGES
This article has earned an Open Data Badge for making publicly available the digitally-shareable data necessary to reproduce the reported results. The data are available at https://doi.org/10.5281/zenodo.6038240.